View Ingested File Content and Artifacts

Learn how to access and review the content of your ingested files in the Airia platform, including processed chunks, generated SQL tables, and the artifacts stored in the Binder. The Binder captures the full text, images, pages, and additional metadata of each file. A native Airia tool can retrieve these artifacts, so the LLM can answer a wider range of questions and produce more accurate, context-rich outputs.

How Airia Processes Your Data

Before you view ingested content, it helps to understand Airia’s multi-stage ingestion pipeline. This process transforms raw documents into AI-ready knowledge that your AI agents can use for retrieval-augmented generation (RAG). The pipeline includes:

1. Document Parsing and Chunking

Files are identified, parsed, and broken down into smaller, manageable pieces called chunks.

2. Vector Embeddings

Each chunk is transformed into a vector embedding, a numerical representation that captures its semantic meaning, making it efficiently searchable.

3. Image Analysis (Optional)

If enabled, Airia detects and analyzes images within documents, generating AI descriptions. This enhances document understanding and makes visual content searchable.

4. Text-to-SQL (for CSV and Excel)

For CSV and Excel files, Airia offers Text-to-SQL capability, transforming the file into a searchable SQL table.

5. Artifact Generation (The Binder)

During processing, Airia generates various artifacts beyond semantic embeddings, such as the full text of the document, extracted images, and their descriptions. These artifacts are stored in the Binder.
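
As a rough illustration of stages 1 and 4, the sketch below splits text into overlapping fixed-size chunks and loads a CSV file into an in-memory SQLite table. It is not Airia's implementation: the chunk size, the overlap, the function names (`chunk_text`, `csv_to_table`), and the choice of SQLite are all assumptions made to keep the example self-contained.

```python
import csv
import io
import sqlite3


def chunk_text(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into character windows of `size` that overlap by `overlap`.

    Hypothetical strategy: Airia's actual chunking rules are not documented here.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
        start += size - overlap
    return chunks


def csv_to_table(csv_text: str, table: str, conn: sqlite3.Connection) -> None:
    """Create a table named `table` from CSV text; the first row is the header.

    Every value is stored as text. Identifiers are only double-quoted, which
    is enough for a sketch but not for untrusted input.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)


if __name__ == "__main__":
    print(len(chunk_text("lorem ipsum " * 50)))

    conn = sqlite3.connect(":memory:")
    csv_to_table("region,amount\nEMEA,120\nAPAC,80\nEMEA,50\n", "sales", conn)
    # Once the file is a table, a question becomes an ordinary SQL query.
    print(conn.execute(
        "SELECT COUNT(*) FROM sales WHERE region = 'EMEA'"
    ).fetchone()[0])
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks, which is a common reason to prefer overlapping windows over a hard split.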

What is the Binder?

The Binder is a collection of artifacts generated during the ingestion process. It gives AI agents a deeper understanding of your files, allowing them to answer questions that semantic search alone cannot (e.g., “How many pages?”, “How many images?”, “What is on the image of page 3?”). Agents retrieve these artifacts dynamically when needed. Binder artifacts can include:

fulltext.md: The full text content of the document.
pages/: Individual page content (for multi-page documents like PDFs).
images/: Extracted images and their AI-generated descriptions.
SQL Tables: For CSV and Excel files processed with Text-to-SQL.
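
To show why these artifacts make questions such as “How many pages?” answerable, here is a minimal sketch that models a Binder as a local directory and counts its contents. The Binder is accessed through the Airia platform and its tools, not through the file system, so the directory mirror, the file names inside `pages/` and `images/`, the image extensions, and the helper names are all hypothetical.

```python
import tempfile
from pathlib import Path

# Assumed image extensions; the real set depends on the parser.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tiff"}


def summarize_binder(root: Path) -> dict:
    """Count the artifacts in a directory laid out like a Binder."""
    pages_dir = root / "pages"
    images_dir = root / "images"
    page_count = (
        sum(1 for p in pages_dir.iterdir() if p.is_file())
        if pages_dir.is_dir() else 0
    )
    # images/ holds images and their descriptions; count only the images.
    image_count = (
        sum(1 for p in images_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)
        if images_dir.is_dir() else 0
    )
    return {
        "has_fulltext": (root / "fulltext.md").is_file(),
        "page_count": page_count,
        "image_count": image_count,
    }


def make_sample_binder(root: Path) -> None:
    """Create a small, made-up Binder layout for demonstration."""
    (root / "fulltext.md").write_text("full text of the document")
    (root / "pages").mkdir()
    for number in (1, 2, 3):
        (root / "pages" / f"page_{number}.md").write_text(f"page {number}")
    (root / "images").mkdir()
    (root / "images" / "image_1.png").write_bytes(b"")
    (root / "images" / "image_1.md").write_text("AI-generated description")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_sample_binder(Path(tmp))
        print(summarize_binder(Path(tmp)))
```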

Access Ingested File Content

Follow these steps to view the processed content and artifacts for your ingested files:

1. Navigate to the Data Sources tab in the Airia platform.
2. Locate the desired data source and click on it to view the list of ingested files.
3. In the file list, find the specific file you want to examine.
4. Click on the file name or select the View content option (if available) to open its detailed view.
5. Within the file’s detail view, you will find several tabs:
- Chunks: Displays the list of generated text chunks from the document.
- SQL: (If applicable) Shows the SQL table generated for files processed with Text-to-SQL.
- Binder: Presents a list of all generated artifacts for that specific file.

Supported Binder Artifacts

The availability of specific artifacts in the Binder depends on the file type and the parser used during ingestion.

File Type / Parser	`fulltext.md`	`pages/`	`images/`
PDF: Basic	✅	✅	✅
PDF: Advanced	✅	✅	✅
PDF: Universal	✅	✅	❌
PDF: Intelligent	✅	✅	✅
TXT	✅	❌	❌
MS Office (docx, pptx, etc.)	✅	❌	✅
Excel	✅	❌	❌
Images (png, jpg, tiff, etc.): Basic	✅	✅	✅
Images (png, jpg, tiff, etc.): Advanced	✅	✅	✅
Images (png, jpg, tiff, etc.): Universal	✅	✅	✅
Images (png, jpg, tiff, etc.): Intelligent	✅	✅	✅
Confluence Page	✅	❌	✅
Notion Page	✅	❌	✅
Email (Sendgrid, Outlook)	✅	❌	❌
YAML	✅	❌	❌
XML	✅	❌	❌
CSV	✅	❌	❌
ServiceNow	✅	❌	❌
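
If you want to check programmatically whether a page-level or image-level question can be answered for a given file, the matrix can be encoded as a lookup table. The sketch below copies the values from the table above; the names `SUPPORT` and `supports` and the shortened file type labels are illustrative and not part of any Airia API.

```python
ARTIFACTS = ("fulltext.md", "pages/", "images/")
PARSERS = ("Basic", "Advanced", "Universal", "Intelligent")

# Keys are (file type, parser). File types that have no parser
# breakdown in the table use parser=None.
SUPPORT = {("PDF", parser): (True, True, True) for parser in PARSERS}
SUPPORT[("PDF", "Universal")] = (True, True, False)
SUPPORT.update({("Images", parser): (True, True, True) for parser in PARSERS})
for name in ("MS Office", "Confluence Page", "Notion Page"):
    SUPPORT[(name, None)] = (True, False, True)
for name in ("TXT", "Excel", "Email", "YAML", "XML", "CSV", "ServiceNow"):
    SUPPORT[(name, None)] = (True, False, False)


def supports(file_type: str, artifact: str, parser=None) -> bool:
    """Return True if the table lists `artifact` for this file type and parser.

    Raises KeyError for a combination that is not in the table.
    """
    row = SUPPORT[(file_type, parser)]
    return row[ARTIFACTS.index(artifact)]


if __name__ == "__main__":
    # A PDF parsed with the Universal parser has pages/ but no images/.
    print(supports("PDF", "pages/", parser="Universal"))
    print(supports("PDF", "images/", parser="Universal"))
```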

Utilize Binder Knowledge with AI Agents

To enable your AI agents to access and leverage the detailed information within the Binder, you need to configure specific tools in your project.

1. Navigate to the MCP & Tools tab in your Airia project.
2. Click the Add new tool button.
3. Search for and select the following tools:
- Binder Content Retrieval
- List artifacts in Binder
- List folders in Binder
4. Add these tools to your project. No specific configuration is required for the tools themselves.
5. Attach these tools to your Language Model (LLM).

💡 Note: When one of these tools is used to retrieve images, the tool response is added to the LLM context as content of type ‘image’. This allows multi-modal LLMs to process the image natively and answer questions about visual content.

💡 Note: The binder ID corresponds to the document’s file ID in the data store. Binder tools accept a fileId parameter, which can be obtained from the chunk metadata returned by either the data search step or the Datastore Semantic and Keyword Search tool.
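
The hand-off from search results to Binder tools can be sketched as follows. Only the `fileId` parameter name comes from the note above. The shape of the chunk dictionaries (a `metadata` key holding `fileId`), the sample values, and both function names are assumptions for illustration; the real search output may be structured differently.

```python
def file_ids_from_chunks(chunks: list[dict]) -> list[str]:
    """Return the distinct fileId values found in chunk metadata, in first-seen order."""
    seen: dict[str, None] = {}
    for chunk in chunks:
        file_id = chunk.get("metadata", {}).get("fileId")
        if file_id is not None:
            seen.setdefault(file_id, None)
    return list(seen)


def binder_tool_arguments(chunks: list[dict]) -> list[dict]:
    """Build one argument payload per file for a Binder tool call."""
    return [{"fileId": file_id} for file_id in file_ids_from_chunks(chunks)]


if __name__ == "__main__":
    # Made-up search results; several chunks can belong to the same file.
    search_results = [
        {"text": "Quarterly revenue grew", "metadata": {"fileId": "file-a"}},
        {"text": "See the chart on page 3", "metadata": {"fileId": "file-b"}},
        {"text": "Revenue by region", "metadata": {"fileId": "file-a"}},
    ]
    print(binder_tool_arguments(search_results))
```

Deduplicating the IDs first avoids listing the same Binder once per matching chunk.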